Europäisches Patentamt

Office européen des brevets







(11) EP 0 818 744 A2

(12) EUROPEAN PATENT APPLICATION



(43) Date of publication: 

14.01.1998 Bulletin 1998/03 

(21) Application number: 97304412.6 

(22) Date of filing: 24.06.1997



(51) Int. Cl.6: G06F 17/50, C07K 1/00,
C07K 5/08, C12N 9/74



(84) Designated Contracting States:
AT BE CH DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
Designated Extension States:
AL LT LV RO SI

(30) Priority: 08.07.1996 GB 9614302
07.08.1996 GB 9616562

(71) Applicant: Proteus Molecular Design Limited
Macclesfield, Cheshire SK11 0JL (GB)

(72) Inventors:
• Young, Stephen Clinton
Stockport, Cheshire SK4 4DL (GB)
• Murray, Christopher
Macclesfield, Cheshire SK10 2TT (GB)

(74) Representative: Cockbain, Julian, Dr.
Frank B. Dehn & Co.,
European Patent Attorneys,
179 Queen Victoria Street
London EC4V 4EL (GB)



(54) Process for selecting candidate drug compounds 



(57) The invention relates to a process for drug candidate identification, said process comprising the steps of:

(1) obtaining a computerised representation of the three-dimensional structure of a binding site on the surface of
a biological macromolecule;

(2) generating a computerised model of the functional structure of said binding site which may be used to identify
favourable and unfavourable interactions between the binding site and a drug candidate molecule;

(3) identifying a molecular fragment (or "template" T) capable of placement within said binding site and capable
of carrying at least one (preferably a plurality (ie. at least two) and especially preferably at least 3) substituent
group, said molecular fragment either being capable of being synthesized from reagent compounds accessible in
substituted form whereby to import said substituent groups on synthesis of said molecular fragment or being present
in an accessible reagent compound capable of substitution with said substituent groups by reaction with further
accessible reagent compounds;

(4) generating a set of lists of accessible reagent compounds (eg. a₁-A, a₂-A, a₃-A, etc, b₁-B, b₂-B, b₃-B, etc,
c₁-C, c₂-C, c₃-C, etc), the lists being such that a combination of compounds taken from each list (eg. a₁-A, b₃-B
and c₁₁-C) may be reacted to produce a candidate compound comprising said molecular fragment carrying a plurality
of substituent groups (eg. a₁b₃c₁₁T) thereby generating a first virtual library of candidate compounds being the
theoretical set of compounds producible by reaction of the members of said lists (ie. a₁b₁c₁T, a₁b₁c₂T, a₁b₂c₁T
etc), each member of each list comprising a component (eg. A, B, C, etc.) common to the other members of that
list and a component (eg. a₁, b₁, c₁, etc) unique within that list;

(5) for each said list limiting the number of members thereof using a first set of exclusion rules thereby to generate
a restricted second virtual library of candidate compounds, the operation of said first set of rules involving for each
member of each list computerised comparison for favourable or unfavourable interactions between said computerised
model and a structure comprising said molecular fragment and a substituent deriving from the unique
component within said list of that member, the molecular fragment and the computerized model being held in fixed
spatial relationship to each other for said comparison;

(6) evaluating and ranking by computer the members of said second virtual library for favourable and unfavourable
interactions with said computerised model and thereby generating a restricted third virtual library of candidate
compounds ranked as having favourable interactions;

(7) optionally, selecting from said third virtual library at least one further molecular fragment and repeating steps
(4), (5) and (6) to generate an alternative third virtual library;

(8) screening said third virtual library using a second set of exclusion rules thereby to generate a restricted fourth
virtual library of candidate compounds comprising compounds which are candidates for synthesis and experimental
evaluation for drug efficacy;

(9) synthesizing some or all candidate compounds of said fourth virtual library to produce a candidate compound
library;

(10) experimentally evaluating the compounds of said candidate compound library for drug efficacy;

(11) analysing the experimental efficacy data generated in step (10) for structure-activity relationship information;

(12) using the information derived in step (11) selecting a revised set of lists of accessible reagent compounds,
said lists being expanded to include selected reagents not present in the restricted lists generated in step (5) and
optionally restricted to exclude selected reagents present in the restricted lists generated in step (5);

(13) repeating steps (6) and (7) to identify further compounds which are candidates for synthesis and experimental
evaluation for drug efficacy;

(14) synthesising and experimentally evaluating said further compounds for drug efficacy;

(15) if required repeating steps (11) to (14) one or more times;

(16) identifying as a lead candidate a compound synthesized and experimentally evaluated as above.

The process of the invention is characterised by the rapid generation of a relatively small set of readily
synthesisable candidate compounds with a high success rate in terms of drug efficacy and hence a high predictive
value for directing subsequent iterations.

Description

FIELD OF THE INVENTION

This invention relates to a process for selecting lead candidate drug compounds, and in particular to such a process
in which synthesis of candidate compounds is simplified and minimized and success rate with synthesized compounds
is maximized.

BACKGROUND OF THE INVENTION 

Drug discovery has been a time and resource consuming exercise. Traditionally, key steps in drug discovery have
included the identification of a compound or set of compounds having the desired drug property, the identification of
the active structure within such compounds and the identification of a lead candidate, a compound which incorporates
that structure and combines adequate activity with acceptable toxicity and synthetic accessibility. By acceptable
synthetic accessibility it is meant that the lead compound should be producible via a synthetic route which is sufficiently
straightforward and inexpensive that commercial production of the compound is a viable option.

The identification of active compounds has involved screening of extensive compound libraries for the desired
drug property. Recently, the technique known as combinatorial chemistry has offered a moderate cost route to the
synthesis of very large compound libraries which can be screened in this manner. Although it is now increasingly being
applied to the synthesis of libraries of non-peptide organic molecules, the combinatorial chemistry technique is
especially applicable to the production of libraries of peptide and peptoid compounds, and synthesis and testing of such
compound libraries can even be automated and operated under computer control. Thus for example an alternative
approach to drug discovery using computer-controlled combinatorial chemistry is described by 3-Dimensional
Pharmaceuticals Inc. in WO-A-96/08781.

Unfortunately, however, the peptide and peptoid compounds for which such a combinatorial chemistry approach
is particularly suited, due to the ease with which peptide molecules can be produced with a multiplicity of sequences
on automated peptide synthesizers, often display undesirable pharmacokinetics, such as poor bioavailability.

An alternative approach to drug discovery has also developed over recent years. This approach, referred to
variously as Structure-Based Drug Design (SBDD) or Computer Aided Molecular Design (CAMD), involves structural
analysis of the receptor site for the drug molecule and can involve computerized generation of a molecular structure which
is capable of binding to that site, ie. a structure which has an appropriate structural framework to fit within the receptor
site and which is so functionalized as to have favourable interactions with selected functional components of the
receptor site.

One example of the SBDD system is the PRO_LIGAND system of Proteus Molecular Design Limited. This is
described for example by Clark et al in a series of papers: J. Comput.-Aided Mol. Design 9: 13-32 (1995), 9: 139-148
(1995), 9: 213-225 (1995) and 9: 381-395 (1995), J. Med. Chem. 37: 3994-4002 (1994), and J. Chem. Inf. Comput.
Sci. 35: 914-923 (1995).

While highly effective, SBDD serves to generate and assess molecular structures on the basis of predicted activity
without particular regard to synthetic accessibility. These molecules must then be made and tested and subsequent
optimization to produce a lead candidate may require time consuming, complicated or expensive chemical syntheses.

It has now been recognised that by combining certain of the features of combinatorial chemistry with certain features
of SBDD one can produce a drug discovery system in which only a relatively limited compound library need be generated
before a range of active compounds is identified, that that range of active compounds may provide sufficient
structure-activity relationship information for a lead candidate to be identified with relatively little iteration (ie. relatively little
extension of the library that is initially generated and tested), and that the library may be generated on rational principles
ensuring that the vast majority of compounds in the library may be synthetically readily accessible.

In other words, in using the process of the invention to generate the structure-activity information necessary to
identify a lead candidate one may avoid the need to make and test the large compound libraries required by prior art
routine screening or by combinatorial chemistry and, unlike prior art SBDD techniques, the active compounds identified
so will implicitly be synthetically readily accessible.

Thus viewed from one aspect the invention provides a process for drug candidate identification, said process
comprising the steps of:

(1) obtaining a computerised representation of the three-dimensional structure of a binding site on the surface of
a biological macromolecule;

(2) generating a computerised model of the functional structure of said binding site which may be used to identify
favourable and unfavourable interactions between the binding site and a drug candidate molecule;

(3) identifying a molecular fragment (or "template" T) capable of placement within said binding site and capable
of carrying at least one (preferably a plurality (ie. at least two) and especially preferably at least 3) substituent
group, said molecular fragment either being capable of being synthesized from reagent compounds accessible in
substituted form whereby to import said substituent groups on synthesis of said molecular fragment or being present
in an accessible reagent compound capable of substitution with said substituent groups by reaction with further
accessible reagent compounds;

(4) generating a set of lists of accessible reagent compounds (eg. a₁-A, a₂-A, a₃-A, etc, b₁-B, b₂-B, b₃-B, etc,
c₁-C, c₂-C, c₃-C, etc), the lists being such that a combination of compounds taken from each list (eg. a₁-A, b₃-B and
c₁₁-C) may be reacted to produce a candidate compound comprising said molecular fragment carrying a plurality
of substituent groups (eg. a₁b₃c₁₁T) thereby generating a first virtual library of candidate compounds being the
theoretical set of compounds producible by reaction of the members of said lists (ie. a₁b₁c₁T, a₁b₁c₂T, a₁b₂c₁T
etc), each member of each list comprising a component (eg. A, B, C, etc.) common to the other members of that
list and a component (eg. a₁, b₁, c₁, etc) unique within that list;

(5) for each said list limiting the number of members thereof using a first set of exclusion rules thereby to generate
a restricted second virtual library of candidate compounds, the operation of said first set of rules involving for each
member of each list computerised comparison for favourable or unfavourable interactions between said computerised
model and a structure comprising said molecular fragment and a substituent deriving from the unique
component within said list of that member, the molecular fragment and the computerized model being held in fixed
spatial relationship to each other for said comparison;

(6) evaluating and ranking by computer the members of said second virtual library for favourable and unfavourable
interactions with said computerised model and thereby generating a restricted third virtual library of candidate
compounds ranked as having favourable interactions;

(7) optionally, selecting from said third virtual library at least one further molecular fragment and repeating steps
(4), (5) and (6) to generate an alternative third virtual library;

(8) screening said third virtual library using a second set of exclusion rules thereby to generate a restricted fourth
virtual library of candidate compounds comprising compounds which are candidates for synthesis and experimental
evaluation for drug efficacy;

(9) synthesizing some or all candidate compounds of said fourth virtual library to produce a candidate compound 
library; 

(10) experimentally evaluating the compounds of said candidate compound library for drug efficacy;

(11) analysing the experimental efficacy data generated in step (10) for structure-activity relationship information; 

(12) using the information derived in step (11) selecting a revised set of lists of accessible reagent compounds,
said lists being expanded to include selected reagents not present in the restricted lists generated in step (5) and 
optionally restricted to exclude selected reagents present in the restricted lists generated in step (5); 

(13) repeating steps (6) and (7) to identify further compounds which are candidates for synthesis and experimental
evaluation for drug efficacy;

(14) synthesising and experimentally evaluating said further compounds for drug efficacy;

(15) if required repeating steps (11) to (14) one or more times;

(16) identifying as a lead candidate a compound synthesized and experimentally evaluated as above.

Viewed from an alternative aspect the invention provides a method of manufacturing a drug substance, said method
comprising the steps of:

(1) obtaining a computerised representation of the three-dimensional structure of a binding site on the surface of
a biological macromolecule;

(2) generating a computerised model of the functional structure of said binding site which may be used to identify
favourable and unfavourable interactions between the binding site and a drug candidate molecule;

(3) identifying a molecular fragment capable of placement within said binding site and capable of carrying at least
one substituent group, said molecular fragment either being capable of being synthesized from reagent compounds
accessible in substituted form whereby to import said substituent groups on synthesis of said molecular fragment
or being present in an accessible reagent compound capable of substitution with said substituent groups by reaction
with further accessible reagent compounds;

(4) generating a set of lists of accessible reagent compounds, the lists being such that a combination of compounds
taken from each list may be reacted to produce a candidate compound comprising said molecular fragment carrying
a plurality of substituent groups thereby generating a first virtual library of candidate compounds being the
theoretical set of compounds producible by reaction of the members of said lists, each member of each list comprising
a component common to the other members of that list and a component unique within that list;

(5) for each said list limiting the number of members thereof using a first set of exclusion rules thereby to generate
a restricted second virtual library of candidate compounds, the operation of said first set of rules involving for each
member of each list computerised comparison for favourable or unfavourable interactions between said computerised
model and a structure comprising said molecular fragment and a substituent deriving from the unique
component within said list of that member, the molecular fragment and the computerised model being held in fixed
spatial relationship to each other for said comparison;

(6) evaluating and ranking by computer the members of said second virtual library for favourable and unfavourable
interactions with said computerised model and thereby generating a restricted third virtual library of candidate
compounds ranked as having favourable interactions;

(7) optionally, selecting from said third virtual library at least one further molecular fragment and repeating steps
(4), (5) and (6) to generate an alternative third virtual library;

(8) screening said third virtual library using a second set of exclusion rules thereby to generate a restricted fourth
virtual library of candidate compounds comprising compounds which are candidates for synthesis and experimental
evaluation for drug efficacy;

(9) synthesizing some or all candidate compounds of said fourth virtual library to produce a candidate compound
library;

(10) experimentally evaluating the compounds of said candidate compound library for drug efficacy;

(11) analysing the experimental efficacy data generated in step (10) for structure-activity relationship information; 

(12) using the information derived in step (11) selecting a revised set of lists of accessible reagent compounds,
said lists being expanded to include selected reagents not present in the restricted lists generated in step (5) and 
optionally restricted to exclude selected reagents present in the restricted lists generated in step (5); 

(13) repeating steps (6) and (7) to identify further compounds which are candidates for synthesis and experimental
evaluation for drug efficacy;

(14) synthesising and experimentally evaluating said further compounds for drug efficacy;

(15) if required repeating steps (11) to (14) one or more times;

(16) identifying as a lead candidate a compound synthesized and experimentally evaluated as above;

(17) manufacturing the compound identified in step (16) above; and, optionally,

(18) admixing the compound manufactured in step (17) above with at least one pharmaceutically acceptable carrier
or excipient.

By "accessible" it is meant that the reagent compounds are commercially available or are synthesizable, preferably
via relatively simple routes, from commercially available materials, eg. using known synthetic procedures.

By a "virtual library" is meant the set of compounds theoretically attainable by the inter-reaction of the reagents in
the reagent lists for that library. By contrast by "compound library" or "candidate compound library" is meant a library
of compounds that have been synthesized.
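
As an illustration of the "virtual library" definition, the following is a minimal sketch in Python that enumerates the theoretical products of three reagent lists. The list contents, the label format and the function name `enumerate_virtual_library` are assumptions made for illustration only; they are not taken from the specification.

```python
from itertools import product

# Hypothetical reagent lists: each entry names the "unique" component of a
# reagent whose "common" component (A, B or C) attaches it to the template T.
reagent_lists = {
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2", "b3"],
    "C": ["c1", "c2"],
}

def enumerate_virtual_library(lists, template="T"):
    """Yield one label per theoretical product, e.g. 'a1b1c1T'."""
    for combo in product(*lists.values()):
        yield "".join(combo) + template

library = list(enumerate_virtual_library(reagent_lists))
print(len(library))   # 3 * 3 * 2 = 18 theoretical compounds
print(library[:3])    # ['a1b1c1T', 'a1b1c2T', 'a1b2c1T']
```

None of these 18 entries need be synthesized for the virtual library to exist; a compound library in the sense defined above would contain only those members actually made.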

In the process of the invention, the reagent lists in the first iteration can conveniently be limited to select only one
reagent of any group in which the unique substituents (ie. a₁, a₂, a₃ etc) are closely analogous (eg. optical isomers,
alkylene chain length homologs, equivalently substituted compounds (eg. compounds substituted by different halo
atoms), etc). Similarly, while the reagent lists may initially include both commercially available compounds and
transformation products of such compounds, for the first iteration (eg. for the first performance of step (6) for a given template)
the lists may be limited to exclude reagents which although commercially available have a long delivery time or are
particularly expensive as well as transformations which although synthetically feasible are complex, have poor yields,
or require significant purification, etc. In the second and further iterations however such analogs or less readily available
compounds can be reinstated within the reagent lists where the structure-activity information derived in step (11)
suggests that they may also lead to effective compounds.
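
The first-iteration limiting described above can be sketched as follows. This is only an illustrative sketch: the `Reagent` fields, the cost and delivery thresholds and the example reagents are all hypothetical, since the specification names the criteria but gives no numerical values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reagent:
    name: str
    analog_group: str    # reagents sharing a group are treated as close analogs
    cost: float          # hypothetical cost per gram
    delivery_days: int   # hypothetical supplier lead time

def first_iteration_list(reagents, max_cost=100.0, max_delivery_days=14):
    """Keep one readily available representative per analog group."""
    chosen = {}
    for r in reagents:
        if r.cost > max_cost or r.delivery_days > max_delivery_days:
            continue  # held back; may be reinstated in a later iteration
        best = chosen.get(r.analog_group)
        if best is None or (r.cost, r.delivery_days) < (best.cost, best.delivery_days):
            chosen[r.analog_group] = r
    return sorted(chosen.values(), key=lambda r: r.name)

reagents = [
    Reagent("4-fluoro", "4-halo", 20.0, 3),
    Reagent("4-chloro", "4-halo", 12.0, 2),
    Reagent("4-bromo", "4-halo", 150.0, 5),   # too expensive for the first pass
    Reagent("n-propyl", "alkyl", 8.0, 30),    # delivery too slow for the first pass
    Reagent("n-butyl", "alkyl", 9.0, 4),
]
selected = first_iteration_list(reagents)
print([r.name for r in selected])   # ['4-chloro', 'n-butyl']
```

The excluded reagents are not discarded: in a later iteration the same function could be called with relaxed thresholds, or the held-back analogs added directly, where the structure-activity data suggest they are worth pursuing.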

In this way the process of the invention may be carried out in such a way that expensive reagents or complex
synthetic chemistry is only required at a stage where candidate compound efficacy is already established and the drug
discovery process is closing in on a lead candidate.

Nonetheless, by including within the reagent lists not just compounds which are readily available commercially but
also chemical transformation products of available reagents, the theoretical size of the virtual libraries is vastly
expanded and the capability of the process to assess (and if necessary readily synthesize and test) modifications of the
compounds revealed to be active is greatly increased.

Computational requirements may also readily be reduced by limiting the precision of the computer evaluation and
ranking of the virtual libraries for the initial performance of steps (5) and/or (6), eg. by limiting the conformational
freedom of the compound under evaluation or by specifying the position and orientation of the compound within the
binding site model. In this way one may only bring in more complex and accurate evaluation systems for subsequent
iterations or for sets of library members which would require the use of expensive or less readily accessible reagents
for their production. Indeed, this combination of SBDD concepts to screen a virtual library of accessible compounds
before synthesis and experimental evaluation of the resulting limited real library may even render the use of highly
computationally-demanding programs unnecessary - ie. a "quick and dirty" computational evaluation may be entirely
adequate.

If desired, using optional step (7) in the process of the invention one may refine the selection of the template. Thus
for example, step (3) of the method of the invention may be effected by selecting as the template an "anchor" group
(Anch) suitable for binding to a section of the binding site, eg. an aryl or other lipophilic group capable of binding to a
lipophilic region of the binding site, and performing steps (4) to (6) to identify a further set of templates (T₁) which can
be coupled at a substitution site to the anchor group thereby to generate a virtual library of anchor-template (Anch-T₁)
compounds, and to evaluate and rank by computer the members of the anchor-template library for favourable and
unfavourable interactions with the computerised model (the Binding Site Model) whereby to produce a restricted set
of templates ranked as having favourable interactions, one, more or all of which may serve as the molecular fragment
for the reiteration of steps (4) to (6). In this way, by a rational selection of the template, the success rate for the
subsequent synthesis and testing of the candidate compound library, and hence the predictive value of the structure-activity
information derived therefrom which is the hallmark of the process of the present invention, is still further enhanced.

Indeed the process of the invention is characterised by the rapid generation of a relatively small set of readily
synthesisable candidate compounds with a high success rate in terms of drug efficacy and hence a high predictive
value for directing subsequent iterations. Unlike standard combinatorial chemistry the technique is not "blind", relying
on random success in the testing of a large library, and unlike SBDD the technique inherently produces compounds
which are readily synthesizable and readily modified to home in on a lead candidate.

While the rational selection of a template starting from a list of anchor-template candidates as discussed above is
a preferred aspect of the invention, template selection may be on the basis of knowledge of the macromolecular target
or of knowledge of compounds known to bind to or otherwise influence the target. By way of example a template may
be designed to mimic the structure and conformation of a compound known to have the desired drug efficacy while
itself having a structure that is accessible either by combination of two or more reagents having the "unique/common"
component structure referred to above or directly as a readily substitutable reagent compound (a template reagent).

Once the template or templates have been selected, the "common" components of the reagents are implicitly
identified either as functional groups that will react with groups on a template reagent or as groups which will react
together to generate the template. The reagent lists may then be generated by searching a computer database of
available chemicals (such as the ACD database of MDL Information Systems Inc.). These lists may then be
supplemented by inclusion of accessible transformations of the compounds on the lists, eg. oxidation, reduction or substitution
products of such compounds as well as salts, esters, and other feasible chemical transformations. If desired, the
members of the resulting lists may be grouped into groups deemed likely to have similar properties in any resulting candidate
compounds and also ranked in terms of accessibility (ie. expense, delivery delay, complexity of any required
transformation, etc).

The limiting of the lists of reagent compounds in step (5) may conveniently, as discussed above, comprise
restrictions of groups of compounds deemed analogous, elimination of low accessibility compounds, elimination of
compounds with too high a molecular weight, too large a total atom count or with substituent groups thought to possess
undesired properties (eg. charge or reactivity with other groups such that template production may be hindered).

The computational comparison used to further limit the lists may be effected on a list by list basis with for each list
the selected members of the other lists remaining constant or preferably with the other substituent positions on the
template being vacant. Where the comparison does not involve such vacant sites these "selected members" may be
chosen on the basis of perceived compatibility with the binding site but conveniently once the first list (preferably the
shortest) has been evaluated, a highly compatible member of that list will be the invariant selected member for
evaluation of the next list and so on. Advantageously, once highly compatible members of the other lists have been identified,
the first list will be reevaluated with highly compatible members of the other lists being the invariant selected members.
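
The list-by-list comparison with invariant selected members can be sketched as below. The `score` function is a hypothetical stand-in for the computed fit of a fully substituted template against the binding site model; its values, including the bonus for the a2/b2 pairing, are invented purely so that the final re-evaluation of the first list changes the outcome.

```python
def score(selection):
    """Hypothetical fit score for one fully substituted template (higher is better)."""
    base = {"a1": 1.0, "a2": 0.8, "b1": 0.2, "b2": 1.0, "b3": 0.5, "c1": 0.1, "c2": 0.9}
    total = sum(base[m] for m in selection.values())
    if selection["A"] == "a2" and selection["B"] == "b2":
        total += 0.5   # invented cooperative interaction between two positions
    return total

def best_member(name, members, current):
    """Evaluate one list while the selected members of the other lists stay fixed."""
    return max(members, key=lambda m: score({**current, name: m}))

def sequential_screen(lists):
    order = sorted(lists, key=lambda n: len(lists[n]))   # shortest list first
    current = {n: lists[n][0] for n in lists}            # initial selected members
    for name in order:
        current[name] = best_member(name, lists[name], current)
    first = order[0]                                     # re-evaluate the first list
    current[first] = best_member(first, lists[first], current)
    return current

lists = {"A": ["a1", "a2"], "B": ["b1", "b2", "b3"], "C": ["c1", "c2"]}
print(sequential_screen(lists))   # {'A': 'a2', 'B': 'b2', 'C': 'c2'}
```

In this toy example a1 wins the first pass over list A, but a2 is preferred once b2 has been fixed as the invariant member of list B, which is the situation the re-evaluation is meant to catch.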
Alternatively and much more preferably, the computational comparison may be significantly simplified by
preselection of one (or more if necessary) invariant locations and conformations for the template within the binding site
model, followed by comparison, on a list by list basis for individual members of the lists and for an incompletely
substituted template, eg. the template carrying only the substituent(s) deriving from the individual list member being
investigated. In this way, alternative orientations of a list member which satisfies basic requirements such as appropriate
size and functionality (eg. charge, lipophilicity, hydrogen bond donor/acceptor, etc.), may be scrutinised to improve
the predictive value of the ranking which is the result of the comparison.
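
A minimal sketch of this simplified comparison follows, assuming each list member can be given a single score for the template held in one fixed pose and carrying only that member's substituent. The scores, the threshold and the function name `restrict_lists` are hypothetical. The sketch also shows why the approach is cheap: the number of evaluations grows with the sum of the list lengths rather than with their product.

```python
from math import prod

# Hypothetical scores: fit of the template, held in one fixed pose in the
# binding site model, carrying only the named substituent.
SINGLE_SUBSTITUENT_SCORE = {
    "a1": 0.9, "a2": -0.4, "a3": 0.3,
    "b1": -0.1, "b2": 0.7, "b3": 0.6,
    "c1": 0.2, "c2": -0.8,
}

def restrict_lists(lists, scores, threshold=0.0):
    """Screen each list independently; keep members above threshold, best first."""
    restricted = {}
    for name, members in lists.items():
        kept = [m for m in members if scores[m] > threshold]
        restricted[name] = sorted(kept, key=lambda m: scores[m], reverse=True)
    return restricted

lists = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"], "C": ["c1", "c2"]}
restricted = restrict_lists(lists, SINGLE_SUBSTITUENT_SCORE)

evaluations = sum(len(m) for m in lists.values())           # 3 + 3 + 2 = 8
first_library = prod(len(m) for m in lists.values())        # 3 * 3 * 2 = 18
second_library = prod(len(m) for m in restricted.values())  # 2 * 2 * 1 = 4
print(restricted, evaluations, first_library, second_library)
```

Here eight single-substituent evaluations reduce a first virtual library of 18 members to a second virtual library of 4, without any fully substituted compound having been scored.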

For step (1) of the process of the invention, one can conveniently input 3-D structural information about the binding
site (eg. X-ray crystallographic analyses) from published sources, preferably sources which are computer-accessible.

The binding site model generated in step (2) conveniently consists of a representation of those regions of the
binding site that can be considered capable of molecular interaction with a xenobiotic or other molecule, labelled
according to the nature and geometry of the possible interactions, eg. hydrogen bond donor sites, hydrogen bond acceptor
sites, aliphatic and aromatic lipophilic sites, ionic and metal-binding sites, etc.
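
One way such a labelled binding site model might be held in software is sketched below. The class name, the label vocabulary, the complement table and the 1.5 angstrom tolerance are all assumptions for illustration; the sketch treats each site as an ideal position for a complementary ligand feature and ignores directional geometry, which a real model would need to consider.

```python
from dataclasses import dataclass
from math import dist

@dataclass(frozen=True)
class LabelledPoint:
    kind: str        # label for the type of interaction offered
    position: tuple  # (x, y, z) coordinates in angstroms

# Hypothetical pairing of a site label with the ligand feature that complements it.
COMPLEMENT = {
    "hbond_donor": "hbond_acceptor",
    "hbond_acceptor": "hbond_donor",
    "lipophilic": "lipophilic",
    "metal": "metal_binder",
}

def satisfied_sites(model, ligand_features, tolerance=1.5):
    """Return the model sites with a complementary ligand feature within tolerance."""
    hits = []
    for site in model:
        wanted = COMPLEMENT[site.kind]
        if any(f.kind == wanted and dist(f.position, site.position) <= tolerance
               for f in ligand_features):
            hits.append(site)
    return hits

model = [
    LabelledPoint("hbond_donor", (0.0, 0.0, 0.0)),
    LabelledPoint("lipophilic", (4.0, 0.0, 0.0)),
    LabelledPoint("hbond_acceptor", (0.0, 5.0, 0.0)),
]
ligand = [
    LabelledPoint("hbond_acceptor", (0.5, 0.5, 0.0)),  # close to the donor site
    LabelledPoint("lipophilic", (4.0, 1.0, 0.0)),      # close to the lipophilic site
    LabelledPoint("hbond_acceptor", (0.0, 5.0, 0.5)),  # same type as the site: no match
]
print([s.kind for s in satisfied_sites(model, ligand)])  # ['hbond_donor', 'lipophilic']
```

Counting satisfied and unsatisfied sites in this way is one crude means of identifying favourable and unfavourable interactions between the model and a candidate structure.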

DETAILED DESCRIPTION OF THE INVENTION

Briefly put, the process of the invention involves the following steps:

Construction of a virtual combinatorial library based around a template chemistry considered appropriate for the
target molecule and amenable to combinatorial synthesis

Screening of members of the library based on their interaction with a target receptor

Synthesis and testing of representative elements of the library as single compounds using a variety of synthetic
protocols.

With a known target molecule structure, this process can be done efficiently and accurately and overcomes various
difficulties associated with applying combinatorial chemistry or Structure-Based Drug Design.

Firstly, the process offers all the advantages of an array based combinatorial library (single compounds, wider
variety of chemistries) whilst sidestepping the problem of small library arrays. This is because a very large virtual library
is considered and screened computationally, leaving only a small number of compounds to be synthesised and tested.

Secondly, the need to have one synthetic protocol to cover a wide variety of chemistries is relaxed. The synthesis
route can be tailored to accommodate a larger range of chemistries than could be considered by an automated method.
Solution and solid phase methods can be used with protection and deprotection steps as required. This means that a
larger virtual library can be considered and thus the chance of locating active compounds is increased. The process
also allows for simple functional group transformations within the starting materials to increase further the diversity of
the virtual library.

Thirdly, by restricting the design process to molecules which are accessible by specified synthetic routes, one
avoids the problems often associated with rational drug design, ie. uncertain synthetic feasibility and slow feedback
between design and experiment. The process yields a set of compounds based around a common template which can
be rapidly synthesised and assayed for activity against a given target. Such a set of compounds might form an
immediate QSAR (Quantitative Structure Activity Relationship) training set, in contrast to other drug discovery paradigms
where further work would be necessary to derive an equivalent QSAR set. A primary advantage over a traditional
medicinal chemistry approach, which would involve obtaining a lead by a screening process and then synthesising a
large number of analogues to provide an SAR set, is thus one of cost-effectiveness.

In the process of the invention, each member of the virtual library consists of a common template with different
substituents attached to it. The substituents are derived from accessible chemical reagents and it is variation in these
substituents at each template attachment point which causes the combinatorial explosion in the number of individual
molecules in the library. The library has a synthesis route or strategy (sometimes referred to herein as the template
chemistry) associated with it whereby individual members are synthesised from available chemical reagents. The
template itself can be an available chemical or can be formed during the chemical reactions (e.g. in ring forming reactions).
The available reagents may also undergo general molecular transformations before they are attached to the template
and become substituents.

The technological aspects of the process may be separated into four stages. The design specification stage
determines the constraints which are to be applied and explored during the computational screening of the virtual library.
Obviously, these constraints include the actual specification of the library, the 3D structure of the receptor and any
specific constraints derived from the receptor to judge the quality of library members. The second stage involves
selecting chemical reagents and screening the corresponding substituents which are used to form members of the virtual
library. The substituent screening is based on the structure of the receptor. The accepted substituents are further
assessed and filtered using a variety of computer-aided techniques and chemical considerations. The third stage
involves enumeration of the virtual library, i.e. production of the full library after all rejected substituents have been
deleted. The final computational stage of the procedure is to perform simple checks and calculations on the enumerated
library and arrive at a ranking of the molecules in the virtual library for synthesis and testing.

Each of these four stages will now be outlined in detail. 

The three aspects of design specification which are discussed below are template selection, template positioning,
and the design criteria, which will be discussed separately despite being inter-related.

Specification of the design criteria involves careful study of the target macromolecule. Thus, decisions need to be
taken at this stage about which X-ray structure(s) of the receptor are to be used (if more than one is available), and
whether some refinement by molecular dynamics/molecular mechanics needs to be carried out in order to generate a
more accurate starting point for molecular design. Typically more than one snapshot of the receptor structure will be
used in successive experiments. Also it is necessary at this stage to decide on the key functionalities in the active site
with which the substituents on the candidate compounds are to interact. A 'design model' is then generated for each
template attachment point, eg. using the Design Model Generation functionality of PRO_LIGAND (see Clark et al.
(supra)). A design model consists of a number of interaction sites which originate from specified receptor atoms and
may be either vectors (denoting favourable positions and directions for hydrogen bond interactions with the active site)
or points (denoting positions of favourable lipophilic contact with the active site) (see Bohm, J. Comput.-Aided Mol.
Design 6: 61 and 593 (1992) and Klebe, J. Mol. Biol. 237: 212 (1994)). The vectors and points are labelled to indicate
the particular chemistry they represent; thus D-X and A-Y vectors represent potential hydrogen bond donor and
acceptor positions respectively. Similarly, L and R sites represent aliphatic and aromatic lipophilic sites respectively. The
density, positions and orientations of the interaction sites are encoded in a rule-base which can be edited by the user
and is based on a statistical examination of experimentally preferred intermolecular contacts (see Klebe (supra)).

The purpose of the molecular template is to hold in position the substituents which will make hydrogen bonds,
lipophilic contacts or other favourable interactions with the binding site. An advantage of using structural information
in the choice of the template chemistry is that knowledge of the receptor can be used to increase the chances of the
library containing active molecules. A number of important issues can be identified in the selection of the template
chemistry.

- The synthetic chemistry associated with the template should be relatively accessible and capable of delivering a
  wide diversity of substituents at a number of attachment points.

- Ideally, the template itself should be capable of making a number of favourable contacts with the receptor. This
  aids in establishing the position of the template and increases the likelihood that the library will contain active
  molecules.

- In some cases it is possible to infer likely templates from known inhibitors or substrates. For example, in the search
  for a thrombin inhibitor, a known inhibitor of thrombin is PPACK (D-phenylalanyl-prolylarginylchloromethylketone)
  which contains a central proline moiety which could be used as a template, or one could choose a known
  substructure with strong binding (e.g. guanidinium in the S1 pocket of thrombin) which can be pre-positioned and used
  to search for potential templates (ie. using the "anchor" technique referred to above).

- Templates can be designed de novo, using structure-based techniques. This could mean using a de novo design
  method or a receptor-based database screening strategy. It is also advantageous to search reaction databases
  for suitable ring forming reactions which, for example, would give rise to beta-sheet mimetics.

- It is desirable that the template has restricted conformational freedom so that only limited numbers of alternative
  positions for the template need be considered.

The process of template selection may thus involve close collaboration between modellers and synthetic chemists, 
the former providing expertise about the requirements of the templates in terms of molecular interactions at the binding 
site and the latter giving guidance concerning the synthetic feasibility of any choices made. The result of the template
selection process is a set of scaffolds chosen to achieve the best architecture in the active site and to minimise the
synthetic effort required to prepare them. In practice, the decision about which templates to pursue will be a balance 
between the variety of factors discussed above. 

It should be emphasized that unlike many of the combinatorial chemistry techniques described hitherto, it is not 
necessary that the template or the candidate compounds be peptides or peptoids. 

Having chosen the set of templates to be used in the active site of interest, the next task is to position the templates
appropriately within the site. In principle, there will be a very large number of orientations of a given template in the
site (although this number can be reduced if the chosen template makes a specific interaction with the binding site
itself). What is required is to select a subset of these positions which place the template in such a way as to facilitate
the molecular interactions that will be formed by the substituents once they are attached.

This placement process could be achieved automatically by means of various objective docking protocols based
on molecular mechanics or empirically based energy calculations (see Blaney, Perspect. Drug Disc. Des. 2:301 (1993))
or geometric positioning upon interaction sites (see Bohm, J. Comput.-Aided Mol. Des. 8:623 (1994)). The result of
template positioning is a position, or number of positions, in 3D coordinate space for each of the templates. The chosen
orientations are saved for future reference.

The process of substituent selection involves a number of steps:

- Searching the Available Chemicals Directory (ACD) of MDL Information Systems Inc., San Leandro, California,
  US (and/or other chemical compound databases) to find potential substituents for a given template

- Computationally screening these potential substituents, eg. using techniques such as those used in the de novo
  design program, PRO_LIGAND (see Clark et al. (supra))

- Assessing and deciding on the preferred substituents at each position.

Each of these steps is explained more fully below. It is important to realize that substituents attached to different
attachment points are preferably tested independently of each other at this stage. This makes the process of performing
detailed 3D checks on a large virtual library computationally efficient. This and other approximations inherent in this
approach are discussed below.

Given a positioned template, it is possible to infer for each template attachment point the nature of the
interaction(s) the corresponding substituent is to make with the active site (eg. hydrogen bond, lipophilic contact, etc.), the nature
of the functional group required for a coupling reaction to the template (eg. acid chloride with a primary amine) and a
distance range between the point of attachment to the template and the point of interaction with the active site.

These two (or more) substructural criteria with the associated distance range(s) constitute a viable query for a 3D
database search using database searching tools, such as ISIS/3D from MDL Information Systems Inc., Unity from
Tripos Associates Inc., and Chem-3D from Chemical Design Ltd., etc. The query can be made more sophisticated
through a consideration of potential molecular transformations, or through the imposition of synthetic constraints on
allowed chemistries in specified substructures. By using the ACD, the chance that all chosen substituents will be
commercially available is maximised. In general, the search carried out should explore the conformational flexibility of the
database molecules to ensure that as many as possible of the potential substituents at each position will be retrieved.

For each template attachment point, the potential substituents may be saved as 2D structures to a file, eg. in
MDL's SD format (see Dalby, J. Chem. Inf. Comput. Sci. 32: 244 (1992)) and then the Converter program (available
from MSI, San Diego, California, US) may be used to add the necessary hydrogen atoms and generate 3D coordinates
for the structures.

The methods used for the computational screening of potential substituents may conveniently be techniques such
as those used in the de novo design package PRO_LIGAND (see Clark et al. (supra)). As described earlier, each
template attachment site has its own design model and the template attachment sites themselves are appended to
the design models, according to the labels specified in the template file which is input to the program. By automatically
labelling the potential substituents for each template attachment position with appropriate interaction and link sites, it is
possible to use rapid algorithms to establish whether they can form good molecular interactions with the active site.
For more details, see Clark, J. Comput.-Aided Mol. Des. 9:13 (1995) and Murray, J. Comput.-Aided Mol. Des. 9:381
(1995).

The flexibility of this approach is enhanced by the ability to detect specified functional groups and replace them
with another group. This increases the diversity in the virtual library that is computationally screened and so increases
the chance of finding active compounds. The computational deprotection of protected functional groups is one example
of how this feature might be used.

The molecular transformation may be controlled by rules containing a SMILES-like notation (see Weininger, J.
Chem. Inf. Comput. Sci. 28:31 (1988)) for the substructures together with a number of integers. Thus, for instance,
one rule may indicate that up to three silyl ethers are to be replaced by hydroxyl groups in any molecule. The geometry
for the transformed part of the molecule is rebuilt atom by atom using a rule-based procedure and then relaxed with a
molecular mechanics minimisation.

A similar approach is used to protonate or deprotonate certain functional groups specified by the user in order that
the molecules to be placed in the active site have realistic protonation patterns. Once the molecules have been
subjected to all the transformations requested by the user, they are passed on to the initial molecular property screens.

Before subjecting the potential substituents to more computationally demanding subgraph isomorphism and
directed tweak checks, some rapid molecular property screens may be used to eliminate unsuitable structures. Thus,
acceptable ranges may be set for a number of properties, eg.

- Molecular weight

- Number of atoms

- Log P (eg. calculated using the method of Viswanadhan, J. Chem. Inf. Comput. Sci. 29:163 (1989))

- Number of rotatable bonds

Any substituents which fall outside the acceptable ranges may thus be automatically rejected. This is useful, for
example, when the database entry contains more than one component. The program should automatically separate
the components and treat each one as a separate substituent. The screen based on the number of atoms tends to remove
the undesirable component, which is often a counterion. The code can also screen out duplicates.
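
The range-based screens described above can be sketched in a few lines of Python. This is only an illustration: the `Substituent` record, the numeric ranges and the use of the name as a duplicate key are assumptions made for the example and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substituent:
    """Hypothetical record holding pre-computed molecular properties."""
    name: str
    mol_weight: float
    n_atoms: int
    log_p: float
    n_rotatable: int


# Illustrative (assumed) inclusive ranges; real limits are set by the user.
RANGES = {
    "mol_weight": (30.0, 250.0),
    "n_atoms": (4, 40),
    "log_p": (-2.0, 4.0),
    "n_rotatable": (0, 6),
}


def passes_screens(sub, ranges=RANGES):
    """True if every screened property lies inside its acceptable range."""
    return all(lo <= getattr(sub, prop) <= hi for prop, (lo, hi) in ranges.items())


def screen(subs, ranges=RANGES):
    """Drop duplicates (keyed on name here) and out-of-range substituents."""
    seen = set()
    kept = []
    for sub in subs:
        if sub.name in seen:
            continue  # duplicate entry, already considered
        seen.add(sub.name)
        if passes_screens(sub, ranges):
            kept.append(sub)
    return kept
```

With these assumed ranges, a one-atom counterion split out of a salt entry fails the atom-count screen, while the organic component passes.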

A further initial screen on substituents may be employed for some complex template chemistries. Thus for example,
if in a ring forming reaction one chemical reagent gives rise to two substituents on the template, then the two
corresponding template attachment points will have the same list of available chemicals associated with them. Specific
checks should ensure that only chemical reagents which have provided a substituent to pass all screens for the first
template attachment point are considered for the provision of substituents for the second template attachment site.
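
As a minimal sketch of this check, assuming each reagent is identified by a hypothetical string key, the candidate list for the second attachment site can simply be restricted to reagents that survived screening at the first site:

```python
def reagents_for_second_site(first_site_accepted, second_site_candidates):
    """Keep only reagents whose substituent already passed at the first site.

    Both arguments are iterables of reagent identifiers (assumed to be
    hashable keys such as catalogue numbers); candidate order is preserved.
    """
    accepted = set(first_site_accepted)
    return [r for r in second_site_candidates if r in accepted]
```

For instance, if only reagents `"R1"` and `"R3"` passed at the first site, a candidate list `["R1", "R2", "R3"]` for the second site reduces to `["R1", "R3"]`.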

The first step in the subgraph isomorphism matching process is to label the potential substituent with the appropriate
interaction sites. This is accomplished by means of a rule-based procedure where each rule denotes a substructure
in the SMILES-like notation mentioned earlier and indicates if and how each of the atoms in that substructure should
be labelled. Thus for example, one rule might instruct the program to search the substituent for any matches to a
particular specified substructure (eg. C(=NH)N(H)H) and to label the second and fourth atoms of the match as X sites
and the third and fifth atoms of the match as D sites. A powerful regular expression-based syntax is available within
the SMILES-like notation which permits very flexible definitions of the rules; for instance, a further rule might indicate
that any OH or NH group attached to a carbon atom should be labelled as a donor group.

In addition to the interaction sites described earlier, it is also desirable to label each substituent with link sites.
These denote the vector site in the structure where the potential substituent will join to the template. The link sites are
assigned in an identical manner to the interaction sites. Thus, for instance, a further rule might instruct the program to
label the C-C bond in a CCO2H substructure as a link site (link site vectors are denoted V-W). (Note however that a
link site does not have to correspond to an attachment point in an actual chemical reaction; for example, the formation
of a peptide bond may be the chemical reaction associated with substituent attachment, but by defining the template
to already contain the peptide bond, the C-C bond can be used as the computational link site. The chosen definition
is dictated by convenience or computational efficiency, although in templates derived from ring forming reactions, it is
often essential to choose link sites which do not correspond to the bond formed in the chemical reaction.)

If, for any reason, a potential substituent cannot be assigned either interaction sites or link sites, it is automatically
rejected. Otherwise, the program should proceed to seek a 3D match between the interaction/link sites of the substituent
and the interaction/link sites of the design model. This may be accomplished using the subgraph isomorphism algorithm
of Ullmann (see J. ACM 23:31 (1976)) which has been used successfully in many chemical structure applications. In
order to account for the conformational flexibility of the substituents in this process, distance bounds matrices are
calculated using the directed tweak routines which seek to establish the maximum and minimum distances that can
be attained between all pairs of atoms through rotation about rotatable bonds (see Murray, J. Comput.-Aided Mol. Des.
9:381 (1995)). The subgraph isomorphism algorithm then uses these distance ranges in establishing a match in the
manner described by Clark, J. Mol. Graphics 10:194 (1992).

If no match is found for the substituent, it is rejected and the algorithm returns to consider the next available
substituent.
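
The way distance ranges enter the matching can be illustrated with a small Python sketch. It is not the Ullmann algorithm itself: it only shows the pairwise test that a proposed assignment of substituent sites to design-model sites must satisfy. The data layout (dictionaries keyed by site labels), the function name and the tolerance value are assumptions made for the example.

```python
import math
from itertools import combinations


def pairwise_compatible(mapping, model_xyz, bounds, tol=0.5):
    """Check a proposed site assignment against distance bounds.

    mapping   : dict, substituent site label -> design-model site label
    model_xyz : dict, design-model site label -> (x, y, z) coordinates
    bounds    : dict, frozenset({site_a, site_b}) -> (lower, upper), the
                minimum and maximum distance the substituent can attain
                between the two sites by rotating its rotatable bonds
    tol       : matching tolerance in the same length unit (assumed value)
    """
    for s1, s2 in combinations(mapping, 2):
        d_model = math.dist(model_xyz[mapping[s1]], model_xyz[mapping[s2]])
        lower, upper = bounds[frozenset((s1, s2))]
        if not (lower - tol <= d_model <= upper + tol):
            return False  # the substituent cannot span this model distance
    return True
```

A full matcher would enumerate candidate assignments and prune them with a test of this kind; passing it does not by itself guarantee that a single conformation satisfies all the distances at once.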

The finding of a match for a substituent in the subgraph isomorphism check is not necessarily a sufficient condition
for a substituent to be accepted. This is because the distance bounds matrix does not include correlation effects, i.e.
the effect that one interatomic distance having one value might have on the possible values attainable by the other
interatomic distances. Thus, in order to establish whether the substituent is in fact a viable one for the template
attachment point in question, a specific matching conformation should be generated using some form of conformational
exploration procedure (see Clark, J. Mol. Graphics 10:194 (1992)).

The procedure adopted for the experimental trials reported below is based on the directed tweak algorithm (see
Hurst, J. Chem. Inf. Comput. Sci. 34:190 (1994)) which was originally developed for 3D database searching
applications, where it has been shown to be both efficient and effective. Its utility in the field of de novo design has recently
been demonstrated (see Murray, J. Comput.-Aided Mol. Des. 9:381 (1995)).

The directed tweak algorithm takes the match established by the subgraph isomorphism algorithm and then seeks 
to verify it by performing a torsional optimisation of the rotatable bonds in the substituent. After a potential match has 
been located, the substituent is attached to the template. The bond where attachment occurs is treated as rotatable. 
The following cost function is minimised by a steepest descent method:

$$F = \sum_{i=1}^{N} a_i d_i^{2}$$


where the summation occurs over all N interaction sites, and $d_i$ is the distance between the ith substituent interaction
site and the design model interaction site with which it is matched. $a_i$ is a coefficient which depends upon the type of
interaction site being matched and is a simple function of the tolerances used in the subgraph isomorphism algorithm.
This cost function differs from that used by Hurst (J. Chem. Inf. Comput. Sci. 34:190 (1994)) and Murray (supra), in that
the distances between pairs of sites are not included, only the absolute distance between the two matched sites. This
is possible because the template attachment site provides a fixed point of reference in the design model coordinate
space. This means that there are fewer terms in the cost function expression and it is likely that the simpler expression
has fewer local minima. There is also no need to check the chirality of the conformations produced. These advantages
make the approach considerably faster.
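
As a rough illustration, the cost function can be evaluated as below. The sketch assumes that each matched distance enters as its square (consistent with a threshold quoted in squared length units) and that site positions are plain coordinate tuples; the function and argument names are hypothetical, and the torsional optimisation itself is not shown.

```python
import math


def tweak_cost(sub_sites, model_sites, coeffs):
    """Weighted sum of squared distances between matched interaction sites.

    sub_sites   : list of (x, y, z) for the substituent interaction sites
    model_sites : list of (x, y, z) for the matched design-model sites
    coeffs      : list of per-site weights a_i (depend on the site type)
    """
    if not (len(sub_sites) == len(model_sites) == len(coeffs)):
        raise ValueError("one coefficient and one model site per substituent site")
    return sum(
        a * math.dist(p, q) ** 2
        for a, p, q in zip(coeffs, sub_sites, model_sites)
    )
```

Because only site-to-site distances appear, the number of terms grows linearly with N, whereas a cost function built on distances between pairs of sites would grow quadratically.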

After minimisation, the conformation is accepted if it passes the following criteria. The value of the cost function
must be less than a user defined maximum (typically about 0.5 Å²), and the substituent must not be clashing with the
receptor, with the template or with itself. If the conformation fails these checks, the tweak routines are used to find an
alternative conformation - the procedure is repeated until an acceptable geometry is located or a user definable number
of attempts has been exceeded (see Murray (supra)).

The substituent, still attached to the template, is then optionally minimised using a molecular mechanics energy
function. This is done in the presence of the receptor (which is treated as rigid) and a cut-off on the long range terms
of Qk is usually applied. An estimate of the strain energy in the receptor-bound conformation is obtained by performing
the minimisation (starting from the tweak-generated geometry) in the absence of the receptor and subtracting from this
energy the intramolecular energy of the receptor-bound conformation. During these calculations, the template part of
the molecule is held rigid. All molecular mechanics calculations may be done employing the fast and approximate
'Clean' forcefield developed by Hahn (J. Med. Chem. 38:2080 (1995)). Partial charges may be calculated using the method
of Gasteiger and Marsili (see Tetrahedron 36:3219 (1980)). The Clean forcefield bears many similarities to the
'generalised atom' forcefield incorporated in the Chem-X software (available from Chemical Design Ltd, Chipping Norton,
UK) in that it does not rely on extended forcefield atom types. Only element type, hybridisation and bond type are used
in calculating the energy of a system (see Hahn (supra)). A number of minor adjustments may be made in the
implementation of the forcefield. The first is that all hydrogen atoms are treated specifically and are assigned an sp3 atom
type. The second is that van der Waals' radii for potential hydrogen-bond-forming atom pairs are scaled, typically by
0.8. It should be realised that the purpose of the Clean forcefield in the process of the invention is to provide a rough
clean up of the substituents, which may possess distorted geometries caused by unrealistic torsion angles. The
forcefield must be robust, in the sense that it must be able to cope with any chemistries that are given to it, and this is why
a generalised atom forcefield is the most obvious choice. Additionally it must meet the approximate accuracy criteria,
and in this context, the accuracy of the intermolecular terms is important. It was after analysis of intermolecular
geometries obtained using Clean that the scaling of the hydrogen bonding van der Waals' radii was introduced. It is
believed that the forcefield produces improved and reasonable geometries - at least when some portion of the molecule
(here the template) is held fixed in the receptor.

The minimised conformation of the substituent (still attached to the template) is then assigned a score using a
scoring function developed by Bohm for use in the de novo design program LUDI (see J. Comput.-Aided Mol. Des. 8:
243 (1994)). Bohm's scoring function permits an approximate calculation of the binding free energy of the substituent
and template in terms of readily calculable quantities such as lipophilic contact surface area, the number and quality
of hydrogen bonds formed and the number of rotatable bonds. Following Bohm, the form of the equation used is

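$$\Delta G_{binding} = \Delta G_{0} + \Delta G_{hb} \sum_{h\text{-}bonds} f(\Delta R, \Delta\alpha) + \Delta G_{ionic} \sum_{ionic} f(\Delta R, \Delta\alpha) + \Delta G_{lipo} A_{lipo} + \Delta G_{rot} N_{rot}$$

where

$$f(\Delta R, \Delta\alpha) = f_1(\Delta R)\, f_2(\Delta\alpha)$$

and

$$f_1(\Delta R) = \begin{cases} 1 & \Delta R \le 0.2\ \text{Å} \\ 1 - (\Delta R - 0.2)/0.4 & \Delta R \le 0.6\ \text{Å} \\ 0 & \Delta R > 0.6\ \text{Å} \end{cases}$$

and

$$f_2(\Delta\alpha) = \begin{cases} 1 & \Delta\alpha \le 30^{\circ} \\ 1 - (\Delta\alpha - 30)/50 & \Delta\alpha \le 80^{\circ} \\ 0 & \Delta\alpha > 80^{\circ} \end{cases}$$
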

$f(\Delta R, \Delta\alpha)$ is a function which penalises hydrogen bonds whose geometry deviates from ideality. $\Delta R$ is the deviation of
the H...O/N hydrogen bond length from 1.9 Å. $\Delta\alpha$ is the deviation of the hydrogen bond angle N/O-H...O/N from its
ideal value of 180°. $\Delta G_{0}$ is a contribution to binding energy which is independent of interactions with the protein. Bohm
suggests that this may be rationalised as a reduction in binding energy due to loss of translational and rotational entropy
of the ligand. $\Delta G_{hb}$ describes the contribution from an ideal hydrogen bond and $\Delta G_{ionic}$ the contribution from an
unperturbed ionic interaction. $\Delta G_{lipo}$ denotes the contribution from lipophilic interactions which is assumed to be proportional
to the lipophilic contact surface between ligand and protein, $A_{lipo}$. Finally, $\Delta G_{rot}$ describes the loss of binding energy
due to the freezing of internal degrees of freedom in the ligand. $N_{rot}$ is the number of acyclic sp3-sp3 and sp3-sp2 bonds,
excluding the rotations of terminal methyl and amine groups.

The values used for the various coefficients are those adopted by Bohm for the LUDI program: $\Delta G_{0}$ = 5.4, $\Delta G_{hb}$
= -4.7, $\Delta G_{ionic}$ = -8.3, $\Delta G_{lipo}$ = -0.17 and $\Delta G_{rot}$ = 1.4. The coefficients were obtained by fitting the equation to experimental data
for ligand-receptor binding where crystallographic structures for the complexes were available (although a few
geometries were obtained by docking the ligands into the receptor). The accuracy of the function is not expected to be
better than 1.5 orders of magnitude in the binding affinity.
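
A compact Python sketch of a scoring function of this form is given below, using the coefficient values quoted above and the piecewise penalty functions with break points at 0.2 Å and 0.6 Å for the length deviation and at 30° and 80° for the angle deviation. The input representation (lists of per-bond deviations, a pre-computed lipophilic contact area and a rotatable bond count) and all function names are assumptions made for illustration; detecting the hydrogen bonds and computing the contact surface are outside the scope of the sketch.

```python
# Coefficients as quoted in the text (values adopted by Bohm for LUDI).
DG_0, DG_HB, DG_IONIC, DG_LIPO, DG_ROT = 5.4, -4.7, -8.3, -0.17, 1.4


def f1(delta_r):
    """Penalty for deviation of the H...O/N length from ideal (angstroms)."""
    if delta_r <= 0.2:
        return 1.0
    if delta_r <= 0.6:
        return 1.0 - (delta_r - 0.2) / 0.4
    return 0.0


def f2(delta_alpha):
    """Penalty for deviation of the hydrogen bond angle from 180 degrees."""
    if delta_alpha <= 30.0:
        return 1.0
    if delta_alpha <= 80.0:
        return 1.0 - (delta_alpha - 30.0) / 50.0
    return 0.0


def bohm_score(hbonds, ionic, lipo_area, n_rot):
    """Estimate the binding free energy from simple geometric descriptors.

    hbonds, ionic : lists of (delta_r, delta_alpha) deviations per interaction
    lipo_area     : lipophilic contact surface between ligand and protein
    n_rot         : number of rotatable bonds counted by the scoring scheme
    """
    hb_term = sum(f1(dr) * f2(da) for dr, da in hbonds)
    ionic_term = sum(f1(dr) * f2(da) for dr, da in ionic)
    return (DG_0 + DG_HB * hb_term + DG_IONIC * ionic_term
            + DG_LIPO * lipo_area + DG_ROT * n_rot)
```

In this form a more negative value corresponds to a stronger predicted interaction, so a single ideal hydrogen bond contributes -4.7 while each counted rotatable bond costs +1.4.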

Using this scoring function, it is possible to rank the accepted substituents according to the strength of interaction
they are likely to make with the receptor by subtracting the pre-computed score for the template from the total score
for the template-substituent combination.

Since the first-found conformation is not necessarily the highest scoring one available to the substituent, a
user-specified number of acceptable conformations (typically 10 or more) will be sought and scored. After these
conformations have been examined, the substituent geometry with the highest score is saved for future reference.
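
The bookkeeping in the last two paragraphs might look as follows in Python. The sketch assumes that scores are free-energy estimates for which a more negative value means a better predicted interaction (so the "highest scoring" conformation is taken to be the one with the lowest value), and that the scores for each substituent's accepted conformations have already been computed; the function name and data layout are hypothetical.

```python
def rank_substituents(conformer_scores, template_score):
    """Rank substituents by their best template-corrected score.

    conformer_scores : dict, substituent name -> list of total scores, one
                       per accepted conformation of template + substituent
    template_score   : pre-computed score of the template on its own
    Returns (name, contribution) pairs, best (most negative) first;
    substituents with no accepted conformation are left out.
    """
    best = {
        name: min(scores) - template_score
        for name, scores in conformer_scores.items()
        if scores
    }
    return sorted(best.items(), key=lambda item: item[1])
```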

Once potential substituents have been located for each template attachment point, one can automatically
enumerate the possibilities to produce the full library for all combinations of those substituents and the template. However,
it is usually advisable to consider the substituent lists further so as to reduce the size of the enumerated library.

The output thus far for each template attachment point is a directory of substituent files each containing scoring
and database information. A directory of substituents, the receptor structure and the template molecule are read in to
a graphical visualisation package. This package may be designed to allow the user to scroll quickly through the
substituent list in any order whilst displaying the file name and any molecular properties that are present in the substituent
files (e.g. the strain energy, the Bohm score or the components of the score). The properties may be displayed in a
spreadsheet running alongside the molecular visualisation. Substituents can be visualised in isolation, with the template
or with the receptor structure.

A set of substituent structures is treated as a list on which operations can be performed by the user. For instance
one would probably want to store all structures with Bohm scores less than a given value in a new list of 'Good Scores';
one might also want to exclude all structures with high strain energies, and possibly remove bad structures judged by
more subjective criteria (e.g. bad chemistries or geometries). The user can have full control over which list of structures
are displayed. At any time the user can write a list to a new or old directory or remove a list from an old directory.

Coupled to the list functionality is a clustering facility which allows one to cluster a specified list on the basis of 2D
chemical functionality. The clustering may be based on similar functionality available in PRO_LIGAND which measures
similarity by Tanimoto coefficients derived from bit string representations of the chemical structures (see Willett, J.
Chem. Inf. Comput. Sci. 26:109 (1986) and Barnard, J. Chem. Inf. Comput. Sci. 32:644 (1992)). The bit strings may
be specified by 172 atom-centred fragments generated from an analysis of 5000 structures in the Cambridge Structural
Database (see Allen, J. Chem. Inf. Comput. Sci. 31:187 (1991)). Several different clustering algorithms are available,
and one may use a hierarchical clustering method such as Complete Linkage or Ward's. (The number of structures
clustered may typically be about 100 or less, so CPU time is not an issue.) A number of tools are available to help
decide on the appropriate number of clusters for the specified lists. The output from the clustering is a new set of lists
each containing an individual cluster. These can be browsed and operated on as described above. Whilst the clustering
is not always perfectly in line with chemical intuition, it is an extremely useful way of navigating through and keeping
track of a fairly large number of substituents.
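
For reference, the Tanimoto coefficient between two fragment bit strings is the number of bits set in both divided by the number of bits set in either. A minimal Python sketch, assuming each bit string is held as an integer (one bit per fragment) and that two empty bit strings are given a similarity of 0.0 by convention:

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two bit strings stored as Python integers."""
    common = bin(bits_a & bits_b).count("1")   # bits set in both
    either = bin(bits_a | bits_b).count("1")   # bits set in at least one
    return common / either if either else 0.0
```

A clustering method such as Complete Linkage would then operate on the matrix of pairwise values (or on 1 minus the coefficient used as a distance).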

The final facility provided by the molecular browser is to rescore a list of substituents using the empirical Bohm
score. Rescoring in this way is practical because tens of structures can be scored per second and is useful because
information gained during the scoring can be used to provide a graphical representation of the score. Hydrogen bonds
or ionic interactions are located, marked and annotated with the contribution they make to the predicted binding affinity.
This saves a lot of time in deciding which hydrogen bonds are formed and how good they are. It also points out hydrogen
bonds which may be contributing to the score in an unrealistic way. Bonds that are considered rotatable are also marked
so that the user can see which bonds are (or are not) contributing to the score. Finally, the grid used to establish the
lipophilic contribution to the score is displayed graphically. Relevant grid points fall into several categories:

- lipophilic ligand atom in contact with lipophilic receptor atom

- lipophilic ligand atom in contact with polar receptor atom (or vice versa)

- polar ligand atom in contact with polar receptor atom

- lipophilic ligand atom in contact with nothing (i.e. solvent)

- polar ligand atom in contact with nothing (i.e. solvent)

- volume of ligand

The user can colour each of these grid point types, though in practice, we have tended to use colours for the first
three types only. The visualisation is useful because it displays aspects of ligand-receptor contact which are often
difficult to assess quickly from looking at the complex alone.

After application of these tools a smaller set of substituents is decided on for each of the template attachment
points. The aspects which are considered in producing this list are:

2D diversity Using the clustering tools ax^ chemk:al knowledge, a diverse set of substituents may be chosen. For- 
50 example, tf there are 10 fluorinated derivatives of phenylalanine only one need be chosen. Expk>ratK»n of different 
chemistries is important because the scoring functions can only be expected to deliver approximate accuracy in the 
prediction ol the binding affinity. 

3D contacts It is important to kx>k at the contacts a substituent is predk:1ed to make with the receptor and to form 
a judgement as to whether these seenrt reasonable or not. In partk:ular, substituents whch have a large amount ol 
5S polar-noripolar contact are suspect. There shouk) also be an awareness of 3D diversity and there shouM be an attempt 
to target molecules which explore different forms of receptor contact to make up for defrciencies in the scoring criteria 

Synthetic considerations: There should be a consideration of synthetic feasibility. Although the strategy of making single compounds by the most appropriate protocol means that a larger diversity of substituents is synthetically accessible, there will still be some substituents which contain functionalities that are difficult to incorporate in any synthetic protocol. Additionally, where one compound is to be chosen from several similar possibilities, choices could be made on the basis of ease of availability or price of the compounds.

Scores: The scores of the substituents (e.g. Bohm scores, forcefield energies, etc.) can be used to choose preferred substituents from among lists of similar compounds.
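
The interplay between 2D diversity and scores can be illustrated with a minimal Python sketch: substituents are grouped by a cluster label and only the best-scoring member of each cluster is kept. The substituent names, cluster labels and score values are invented for illustration, and the convention that a higher score is better is an assumption rather than something stated for every scoring function.

```python
from collections import defaultdict

# Hypothetical (name, cluster label, score) records; all values are invented.
SUBSTITUENTS = [
    ("2-F-Phe", "fluoro-Phe", 6.1),
    ("3-F-Phe", "fluoro-Phe", 6.4),
    ("4-F-Phe", "fluoro-Phe", 5.9),
    ("cyclohexyl-Ala", "aliphatic", 5.2),
    ("naphthyl-Ala", "bicyclic", 6.8),
]


def pick_cluster_representatives(records):
    """Return the best-scoring substituent of each cluster (assumes higher = better)."""
    by_cluster = defaultdict(list)
    for name, cluster, score in records:
        by_cluster[cluster].append((score, name))
    # max() on (score, name) tuples picks the highest score within each cluster.
    return {cluster: max(members)[1] for cluster, members in by_cluster.items()}


representatives = pick_cluster_representatives(SUBSTITUENTS)
for cluster, name in sorted(representatives.items()):
    print(cluster, "->", name)
```

In practice the choice within a cluster would also weigh the 3D contacts and synthetic considerations listed above; the sketch uses the score alone.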

The process of combinatorial enumeration simply involves forming a list of all the remaining substituents at each R-group position and then creating all possible combinations of them. Thus, given a template with three R-group positions and three substituents for each, the combinatorial enumeration procedure will produce 27 different molecules. The geometries produced are based on the highest scoring geometries of the corresponding substituents. The resulting molecules are stored for further analysis or transfer to a 3D database.
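
As an illustration of the enumeration step, the following Python sketch forms every combination of one substituent per R-group position. The position names and substituent labels are hypothetical placeholders; no geometry handling or storage is attempted.

```python
from itertools import product

# Hypothetical substituent labels for three R-group positions (illustrative only).
SUBSTITUENTS_BY_POSITION = {
    "R1": ["a1", "a2", "a3"],
    "R2": ["b1", "b2", "b3"],
    "R3": ["c1", "c2", "c3"],
}


def enumerate_library(substituents_by_position):
    """Return every combination of one substituent per R-group position."""
    positions = sorted(substituents_by_position)
    choices = [substituents_by_position[p] for p in positions]
    return [dict(zip(positions, combo)) for combo in product(*choices)]


library = enumerate_library(SUBSTITUENTS_BY_POSITION)
print(len(library))  # 3 * 3 * 3 = 27
print(library[0])
```

With three positions and three substituents each, the list holds 27 entries, matching the count given in the text.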

The resulting molecules can be, but are not usually, minimised with the Clean forcefield and are then rescored in the same manner as the substituents. Estimated logP and molecular weight are also routinely calculated for the complete molecules.

In our applications, the complete molecules have also been subjected to evaluation using the CFF95 forcefield in Discover (available from Molecular Simulations Inc., San Diego, California, US). Simplified cut-down models of the receptor are used, and minimisation and molecular dynamics are used to assess the quality of the designs. If the designs are reasonably stable during dynamics and possess high scoring snapshots then they are considered suitable for synthesis.

The final decision about which molecules to synthesise is made by considering all the data collected for the substituents and enumerated molecules. The full library could be synthesised, or selected molecules can be chosen from the full library. The possibility of using experimental design to choose the best candidates has been explored. The method initially used was D-optimal design, which attempts to maximise the coverage of a specified property space in a subset of molecules chosen from a larger library. In our explorations, the spread in the following properties was approximately maximised:

- the substituents from which each library member was derived
- estimated value of logP for each library member
- the hydrogen bond, the rotatable bond and lipophilic contributions to the Bohm score for each library member

Several constraints can be imposed on the design, such as inclusion or exclusion of compounds which are outside a specified range of a molecular property. The general conclusion of this application of experimental design was that although it was useful, practical considerations, such as ease of synthesis of particular classes of compounds from the full library, were usually more important.
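
The idea behind D-optimal subset selection can be sketched in Python. The greedy procedure below is an assumption chosen for brevity (practical D-optimal design usually relies on exchange algorithms) and is not claimed to be the method used in the work described. Each molecule is represented by a hypothetical descriptor row (an intercept term plus an invented logP value), and at every step the molecule that most increases the determinant of the information matrix X^T X is added.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def information_matrix(rows, n_features, ridge=1e-6):
    """X^T X for the given rows, plus a small ridge so early steps are non-singular."""
    info = [[ridge if i == j else 0.0 for j in range(n_features)]
            for i in range(n_features)]
    for row in rows:
        for i in range(n_features):
            for j in range(n_features):
                info[i][j] += row[i] * row[j]
    return info


def greedy_d_optimal(descriptors, subset_size):
    """Greedily add the molecule that maximises det(X^T X) of the chosen subset."""
    n_features = len(next(iter(descriptors.values())))
    chosen = []
    remaining = dict(descriptors)
    while remaining and len(chosen) < subset_size:
        current_rows = [descriptors[name] for name in chosen]
        best = max(
            sorted(remaining),
            key=lambda name: determinant(
                information_matrix(current_rows + [remaining[name]], n_features)
            ),
        )
        chosen.append(best)
        del remaining[best]
    return chosen


# Hypothetical descriptor rows: [intercept, estimated logP]; values are invented.
DESCRIPTORS = {
    "mol_a": [1.0, 0.5],
    "mol_b": [1.0, 1.0],
    "mol_c": [1.0, 2.0],
    "mol_d": [1.0, 4.5],
}

chosen = greedy_d_optimal(DESCRIPTORS, 2)
print(chosen)
```

For this toy input the two molecules at the extremes of the logP range are selected, which is the expected behaviour of a design that maximises spread in a single property.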

Most of the computationally intensive routines of the operating software for the process of the invention may be written in Fortran, the data structure and data handling code in C, and the drivers and user interface parts in Global. Global is a proprietary interpreted language designed for application to computer-aided molecular design. The main use of Global is that, together with the chemical utilities and their associated data structure routines, it provides a flexible environment for the operation of the process of the invention. A language which allows high order chemical design features and user input to be expressed succinctly and naturally makes the methods easy to program, amend and debug. Global also makes mundane tasks such as IO and memory management straightforward, and frees the programmer to concentrate on the chemical design aspects of a programming task. Because there is no compilation for the interpreted language it is easy to adapt the drivers and run them interactively or in batch mode. The user can either treat the GLOBAL files as input decks in the traditional sense or, if they have more confidence, can make fairly significant changes to the order of operation of the drivers, introducing different screens for the substituents as they see fit. Higher level languages have shown their worth before in CAMD applications, as illustrated by TRIPOS's SPL language or the various languages offered to MSI users.

In the method of the invention, the drug compound may if desired be formulated for administration, e.g. via parenteral or enteral routes, for example orally, rectally, nasally, transdermally, by injection or infusion, or into the lungs. Typical administration forms include tablets, powders, capsules, suppositories, syrups, sprays, solutions, dispersions, suspensions, emulsions and gels. Such compositions may contain conventional pharmaceutically acceptable carriers and excipients, e.g. water for injections, physiological saline, buffers, sweeteners, dispersants, bulking agents, etc.

EXAMPLE

The generation of a library of thrombin inhibitors is described as an example of the present invention.
Thrombin is a trypsin-like serine protease recognised as a key enzyme within the coagulation cascade. Its primary action is catalysis of the conversion of soluble fibrinogen to insoluble fibrin, which is the basis of thrombus formation and blood clotting. In addition, thrombin has several other roles in the control of pro- and anti-coagulant pathways in the coagulation cascade, inducing platelet aggregation, and more general signalling roles via activation of a thrombin receptor. As thrombin is the final step in both the extrinsic and intrinsic clotting cascades, it has attracted much attention as a therapeutic target. Modulation of thrombin activity may be of use to prevent inappropriate thrombus formation, for example as a general anticoagulant, as an adjunct to surgery or as a prophylaxis in various cardiovascular disorders such as myocardial infarction and unstable angina. Direct competitive inhibition of thrombin has been pursued by several pharmaceutical companies in an effort to obtain a new class of anticoagulants and antithrombotics, potentially with good oral bioavailability and improved efficacy and toxicity when compared with existing drugs.

The present example relates to the design of a library of novel thrombin inhibitors which are potential drug candidates. At this stage the quality of the designs was assessed in terms of an in vitro assay of thrombin inhibition. Successful designs may be selected on the basis of the measured inhibition constant (Ki). In addition, the selectivity of the compounds towards thrombin may be assessed by performing enzyme inhibition assays versus structurally-related serine proteases, such as trypsin and Factor Xa. In general, enzyme specificity is an important consideration because an intended thrombin inhibitor may also inhibit fibrinolytic enzymes and hence exert an undesired thrombotic effect.

The first stage in the application of the process was the identification of an appropriate template structure and an associated synthetic strategy. This was achieved by analysis of known thrombin inhibitors in order to identify chemical moieties which appear to contribute favorably to binding. The source of the data was the Brookhaven protein database. From the available thrombin-ligand complexes it was decided to select as a template the proline moiety from the inhibitor PPACK (D-phenylalanyl-prolyl-arginyl-chloromethylketone).

This was chosen for several reasons. Previous analysis of SAR data for thrombin had highlighted PPACK as an effective inhibitor bound at the active site (that is, the site at which the catalytic hydrolysis of the peptide substrate occurs). The activity of PPACK was believed to be the result of making favorable interactions with several distinct regions of the active site, most importantly two hydrophobic pockets (labelled as distal and proximal to the catalytic amino-acid residues) and a polar pocket (the arginine-binding pocket or, in enzyme terminology, the S1 pocket). Proline was considered a good choice for a template because it fulfilled the design criteria that it should make some favorable interactions in itself, and also allow the positioning of substituents which will also make favorable interactions. Analysis of the X-ray structure revealed that proline makes good interactions with the proximal hydrophobic pocket and allows the positioning of a potential library of substituents which are likely to make good interactions with the remaining two pockets. In the absence of an X-ray structure these assumptions would have to be made on the basis of modelling the enzyme-template complex.

The structure of PPACK and some of its key interactions with thrombin are shown in Figure 1. It was the intention to design a library of reversible inhibitors exploring a diverse set of substituents in the D-Phe and Arg positions.
Several sets of substituent lists were prepared using different design criteria. Initially, the N-terminus on proline was targeted with starting reagents that possessed a carboxylic acid (to form a peptide bond with the template), and a hydrogen bond donor plus a hydrophobic group (to form contacts with the D pocket). This was later augmented by a list of sulphonic acids and sulphonyl chlorides (to form a sulphonamide bond with the template). The C-terminus was initially targeted with starting reagents that possessed bis-amines (to form a peptide bond with the template and hydrogen bonds in the S1 pocket). This was augmented by a list of amines with aromatic nitro compounds used as 'protected' anilines, and a list in which amines were 'protected' as nitrile compounds. In all cases, there were 2D and 3D constraints imposed when searching through the ACD. Two positionings of the template in an associated receptor conformation were used. The first was derived directly from the proline position in the crystal structure of the covalently bound PPACK, and the second was derived from a computational simulation of a non-covalently bound analogue of PPACK.

Table 1 gives some details of the numbers of compounds considered at each stage of the process for the second template position (the results from the first are very similar). For the sake of clarity the results of only one substituent list at each template attachment point are given. The 2D search was not fully refined and includes many reagents which are not practicable with any simple synthetic route. It also includes many substituents which would be ruled out of a single protocol combinatorial approach yet have been successfully included in our final compound set. It is clear that even after a thorough application of 3D database searching, the virtual library size is still enormous and receptor screening and scoring are required to reduce it to a manageable number.

Table 1: Statistics for number of molecules considered at each stage. The list of substituents for the Arg position was primary/secondary amines (plus hydrazines) for the 2D search, and bis-amines separated by 5-8 Å for the 3D search. For the Phe position the list was carboxylic acids for the 2D search, and hydrophobic carboxylic acids with a donor group 2-3 Å away for the 3D search.

Stages                           No. of accepted substituents    No. of compounds in
                                 Arg position    Phe position    virtual combinatorial library
After 2D ACD screen              4262            8803            37518386
After 3D ACD screen              894             437             390678
After receptor screen            144             145             20880
After binding affinity screen    65              81              5265
After strain energy screen       53              71              3763
Selected synthesis candidates    9               8               72
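
The library sizes in Table 1 are the products of the numbers of accepted substituents at the two attachment points. The short Python check below, offered as an illustration, recomputes each product from the figures in the table; only the variable and function names are introduced here.

```python
from math import prod

# (stage, Arg-position substituents, Phe-position substituents, reported library size)
TABLE_1 = [
    ("After 2D ACD screen", 4262, 8803, 37518386),
    ("After 3D ACD screen", 894, 437, 390678),
    ("After receptor screen", 144, 145, 20880),
    ("After binding affinity screen", 65, 81, 5265),
    ("After strain energy screen", 53, 71, 3763),
    ("Selected synthesis candidates", 9, 8, 72),
]


def virtual_library_size(*list_sizes):
    """Number of molecules obtained by combining one substituent from each list."""
    return prod(list_sizes)


for stage, n_arg, n_phe, reported in TABLE_1:
    computed = virtual_library_size(n_arg, n_phe)
    print(f"{stage:30s} {n_arg:5d} x {n_phe:5d} = {computed:9d} (reported {reported})")
```

Every computed product agrees with the reported library size, which shows how quickly each screening stage shrinks the combinatorial space.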









The resulting substituent lists were then more thoroughly evaluated using: 2D chemical diversity; visualisation of the 3D contacts made by the substituents (those which interacted with different parts of the receptor were especially targeted); further computational evaluation of the predicted binding affinities, interaction energies and physical properties of the substituents and their enumerated counterparts; and further consideration of synthetic feasibility.

Figures 2 and 3 give examples of the starting materials used in the synthesis of library members based around the proline template. Not all possible members of the library were enumerated. All substituents at the Phe position were enumerated with the prolinyl-agmatine moiety (i.e. representing a good Arg position substituent) and all substituents at the Arg position were coupled to the D-phenylalaninyl-proline moiety (i.e. representing a good Phe position substituent). In addition a full array involving reagents A1-A4 and B1-B4 was synthesised. The basic synthetic protocols were modified as necessary to take account of the wide diversity of functionality in the starting materials. In particular, a solid phase approach was used with the bis-amines (B2, B3, B5, B8 and B9) and solution methods were used for the others. The nitrile compounds were reduced in advance of coupling. All solution phase routes proceeded via coupling of the activated acids to the proline benzyl ester, which allows for deprotection via hydrolysis or hydrogenation dependent upon substituents within the acid.

The D-amino acid analogues of array A had free amino groups protected with Boc. Where B was a symmetrical bis-amine, the amine was attached to acid-labile chlorotrityl resin and the resin washed with dichloromethane and DMF. Fmoc proline was attached using TBTU/DIPEA activation (2 eq). After deprotection of the amino group with 20% piperidine in DMF, the Boc protected A component was coupled as before. The product was cleaved from the resin with 10% TES in TFA (30 min), evaporated to dryness and triturated with diethyl ether to give the crude product. Where B was an asymmetric amine, the Boc protected A component was coupled to proline benzyl ester (1 eq) by activation with TBTU/TEA. Hydrolysis of the benzyl ester using NaOH (1.1 eq) in acetone/water (1:1) yielded dipeptide acid which was pre-activated (TBTU/TEA) and reacted with the amine (1.1 eq) in DMF (or DMF/water (1:1) for water soluble B components). Extraction of the product followed by deprotection (5% aq TFA), evaporation and trituration with diethyl ether, yielded the crude product.

The other members of the A array (sulphonic acids, sulphonyl chlorides and α-hydroxy acid) were attached without further protection. Sulphonyl chlorides were reacted with proline benzyl ester in the presence of TEA (2 eq). The ester product was hydrolysed as above and B1 coupled via TBTU/TEA pre-activation as above. The product was extracted with methanol after evaporation of the reaction mixture to dryness.

The 'protected' B compounds were coupled as described above for asymmetric amine B compounds after appropriate reduction: H2, Pd/C in the case of nitro groups, catalytic transfer hydrogenation (hydrazine hydrate, ethanol, Pd/C, 85°C) for B12, and LAH for the remaining nitrile compounds.

The compounds were tested for inhibition of thrombin and trypsin using a colorimetric microplate assay with synthetic peptide substrates as described by Tapparelli, J. Biol. Chem. 268:4734 (1993). In general the compounds were tested as crude products and the more active compounds were purified, and accurate Ki values were experimentally determined. The results are given in Table 2, and show that almost all of the compounds were active against the two enzymes with several compounds showing selectivity for thrombin. The most active compound (A3B1) has a Ki of 41 nanomolar.

Table 2: Inhibition results for molecules synthesised. The Ki values are in micromolar. Where values are in parentheses, only the crude compounds were tested. Generally, activity increased by between 3 and 10 times when pure samples were used. Where no value is given, the molecules were not active. The errors in the calculation of the Ki for purified compounds are less than 10%.

Compound    Thrombin          Trypsin
A1B1        0.5[illegible]
A1B2                          (0.6)
A1B3        (30)              (10)
A1B4        (100)             (-)
A2B1        [illegible]       [illegible]
A2B2        0.83              0.19
A2B3        2.8               8.8
A2B4
A3B1        0.041             0.60
A3B2        0.30              0.95
A3B3        1.3               58
A3B4        (30)              (-)
A4B1        1.4               1.9
A4B2        (22)              (0.6)
A4B3        (50)              (13)
A4B4        (200)             (-)
A2B5        (10)              (40)
A2B6        (90)              (-)
A2B7        0.71              58
A2B8        (20)              (6)
A2B9        0.69              1.5
A2B10       4.0               590
A2B11
A2B12
A2B13
A2B14       48
A2B15
A5B1        2.9               0.12
A7B1        0.28              1.0
A8B1        1.6               0.67
A9B1        0.53              0.52
A10B1       (-)               (9)
A12B1       (-)               (1)
A13B1       (200)             (90)
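
One way to make the selectivity of the purified compounds explicit is to take the ratio of the trypsin Ki to the thrombin Ki. The Python sketch below does this for a few purified compounds whose values are legible in Table 2; expressing selectivity as this ratio is an assumption made for illustration and is not stated in the text.

```python
# Ki values in micromolar for purified compounds, copied from Table 2: (thrombin, trypsin).
KI_MICROMOLAR = {
    "A3B1": (0.041, 0.60),
    "A3B2": (0.30, 0.95),
    "A3B3": (1.3, 58.0),
    "A2B7": (0.71, 58.0),
    "A2B10": (4.0, 590.0),
    "A5B1": (2.9, 0.12),
}


def thrombin_selectivity(ki_thrombin, ki_trypsin):
    """Ratio above 1 means thrombin is inhibited more strongly than trypsin."""
    return ki_trypsin / ki_thrombin


selectivity = {name: thrombin_selectivity(*ki) for name, ki in KI_MICROMOLAR.items()}
ranked = sorted(selectivity, key=selectivity.get, reverse=True)
for name in ranked:
    print(f"{name:6s} {selectivity[name]:8.2f}")
```

On this measure the most potent compound (A3B1) is roughly 15-fold selective for thrombin, whereas A5B1 favours trypsin.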









At the Phe position, the best scoring substituents were aromatic D-amino acids, which reflects the strict 3D constraints imposed by the thrombin active site and the need to form good hydrophobic contacts if high affinity is to be achieved. The process of the invention did produce non-amino acid solutions but these scored poorly. The best substituent was the p-Br-D-Phe (A3), which is three times more active than the simple Phe derivative. The available starting material was a racemic mixture and the resulting diastereoisomers were separated by HPLC. As predicted, one of the diastereoisomers was at least 100 times less active than the other. Of particular interest are the substituents with polar functionality (A4, A5 and A7) which have not been thoroughly explored before in PPACK analogues. These were selected because they were predicted to form additional hydrogen bonds which, if not contributing to affinity, could enhance selectivity. The poor activity of the sulphonamide derivatives against thrombin was not particularly surprising since the design criteria for this substituent list omitted a hydrogen bond to Gly-216. Despite this drawback, the syntheses were justified because the sulphonamides increased the chemical diversity and allowed the exploration of different modes for the hydrophobic pocket.

At the Arg position, the most active base is agmatine, as expected, since the guanidino group can make excellent contacts with Asp-189 and Gly-218 at the bottom of the S1 subsite. However there is great incentive to diverge from the arginine-like chemistry because of its pharmacokinetic properties and side-effect profile. Di-amino pentane (B9) is active, as would be expected for a lysine analogue (see Brady, Bio Med. Chem 8:1063 (1995)). The other bis-amines also have respectable activity, which is of interest because good hydrophobic contacts in this pocket may increase affinity and selectivity (see Deadman, J. Med Chem 38:1511 (1995)). The activity of the short aniline (B7) is particularly interesting. It is unlikely that this substituent is long enough to interact directly with Asp-189 (although there could be a mediating water molecule); instead it is predicted to form hydrogen bonds to Gly-219 and Ala-190. It was the activity of this compound which caused us to explore different anilines using the functional group transformation strategy.

Viewed from a second aspect the invention also provides novel active compounds identified by the process of this invention. Thus the compounds for which non-parenthesized activity values are given in Table 2 above are deemed to fall within the scope of the invention, as are all other active PPACK analogs incorporating the 'successful' substituents that characterise reagents A3, A4, A5, A7 and B7.

Claims

1. A process for drug candidate identification, said process comprising the steps of: 

(1) obtaining a computerised representation of the three-dimensional structure of a binding site on the surface of a biological macromolecule;

(2) generating a computerised model of the functional structure of said binding site which may be used to identify favourable and unfavourable interactions between the binding site and a drug candidate molecule;

(3) identifying a molecular fragment capable of placement within said binding site and capable of carrying at least one substituent group, said molecular fragment either being capable of being synthesized from reagent compounds accessible in substituted form whereby to import said substituent groups on synthesis of said molecular fragment or being present in an accessible reagent compound capable of substitution with said substituent groups by reaction with further accessible reagent compounds;

(4) generating a set of lists of accessible reagent compounds, the lists being such that a combination of compounds taken from each list may be reacted to produce a candidate compound comprising said molecular fragment carrying a plurality of substituent groups thereby generating a first virtual library of candidate compounds being the theoretical set of compounds producible by reaction of the members of said lists, each member of each list comprising a component common to the other members of that list and a component unique within that list;

(5) for each said list limiting the number of members thereof using a first set of exclusion rules thereby to generate a restricted second virtual library of candidate compounds, the operation of said first set of rules involving for each member of each list computerised comparison for favourable or unfavourable interactions between said computerised model and a structure comprising said molecular fragment and a substituent deriving from the unique component within said list of that member, the molecular fragment and the computerised model being held in fixed spatial relationship to each other for said comparison;

(6) evaluating and ranking by computer the members of said second virtual library for favourable and unfavourable interactions with said computerised model and thereby generating a restricted third virtual library of candidate compounds ranked as having favourable interactions;

(7) optionally, selecting from said third virtual library at least one further molecular fragment and repeating steps (4), (5) and (6) to generate an alternative third virtual library;

(8) screening said third virtual library using a second set of exclusion rules thereby to generate a restricted fourth virtual library of candidate compounds comprising compounds which are candidates for synthesis and experimental evaluation for drug efficacy;

(9) synthesizing some or all candidate compounds of said fourth virtual library to produce a candidate compound library;

(10) experimentally evaluating the compounds of said candidate compound library for drug efficacy;

(11) analysing the experimental efficacy data generated in step (10) for structure-activity relationship information;

(12) using the information derived in step (11) selecting a revised set of lists of accessible reagent compounds, said lists being expanded to include selected reagents not present in the restricted lists generated in step (5) and optionally restricted to exclude selected reagents present in the restricted lists generated in step (5);

(13) repeating steps (6) and (7) to identify further compounds which are candidates for synthesis and experimental evaluation for drug efficacy;

(14) synthesising and experimentally evaluating said further compounds for drug efficacy;

(15) if required repeating steps (11) to (14) one or more times;

(16) identifying as a lead candidate a compound synthesized and experimentally evaluated as above. 

2. A process according to claim 1 wherein in step (7) at least one further molecular fragment is selected from the 
third virtual library, whereafter steps (4), (5) and (6) are repeated to generate an alternative third virtual library 
which is subsequently screened in step (8). 

3. A process according to either of claims 1 and 2 wherein in step (12) said revised set of lists of accessible reagents 
is selected to include reagents excluded from the restricted lists generated in step (5) for being analogs of reagents 
included in said restricted lists. 

4. A process according to any one of claims 1 to 3 wherein in step (12) said revised set of lists of accessible reagents is selected to include reagents excluded from the restricted lists generated in step (5) for involving complex transformation in their synthesis from commercially available reagents.

5. A process according to any one of claims 1 to 4 wherein in step (12) said revised set of lists of accessible reagents is selected to include reagents excluded from the restricted lists generated in step (5) for being produced in low yield in their synthesis from commercially available reagents.

6. A process according to any one of claims 1 to 5 wherein in step (12) said revised set of lists of accessible reagents is selected to include reagents excluded from the restricted lists generated in step (5) for requiring significant purification following their synthesis from commercially available reagents.

7. A process according to any one of claims 1 to 6 wherein in step (12) said revised set of lists of accessible reagents is selected to include reagents excluded from the restricted lists generated in step (5) for being expensive.

8. A process according to any one of claims 1 to 7 wherein in step (1) said representation is derived from X-ray crystallographic data for said macromolecule.

9. A process according to any one of claims 1 to 8 wherein in step (4) said lists of accessible reagents are generated from a computer database of available chemicals.

10. A process according to claim 9 wherein in step (4) said lists of accessible reagents are supplemented to include 
reagents accessible by transformation of reagents identified from said database. 

11. A process according to any one of claims 1 to 10 wherein the model generated in step (2) comprises a representation of those regions of the binding site capable of interaction with a molecule placed in said binding site, the said regions being identified according to the nature and geometry of said interaction.

12. A process according to any one of claims 1 to 11 wherein in step (5) said computerised comparison for a reagent involves in sequence: (i) carrying out a subgraph isomorphism check to establish a match between said unique component of said reagent and said computerised model, (ii) rejecting reagents for which no match can be found, (iii) verifying the match for non-rejected reagents by torsional optimization of the rotatable bonds in the unique component, (iv) calculating the compatibility between the computerised model and a structure comprising the molecular fragment and the substituent deriving from the unique component of the reagent in the conformation predicted by step (iii), (v) optionally repeating steps (iii) and (iv) to seek a conformation with enhanced compatibility, (vi) rejecting reagents for which a preselected degree of compatibility is not found in steps (iv) and (v), (vii) determining a score indicative of a minimum energy level for said structure within said computerised model with the structure and position of said molecular fragment held constant, and (viii) ranking the reagents in a list according to the scores determined in step (vii).

13. A process according to claim 12 wherein in step (vii) scores indicative of strain energy and contributions to energy level of individual interactions of components of said structure with said computerised model are also determined and reagents are rejected if such scores exceed pre-selected limits indicative of undesirable conformation or interaction.

14. Novel active compounds identified by a process according to any one of claims 1 to 13.

15. A method of manufacturing a drug substance, said method comprising the steps of:

(1) obtaining a computerised representation of the three-dimensional structure of a binding site on the surface of a biological macromolecule;

(2) generating a computerised model of the functional structure of said binding site which may be used to 
identify favourable and unfavourable interactions between the binding site and a drug candidate molecule; 

(3) identifying a molecular fragment capable of placement within said binding site and capable of carrying at least one substituent group, said molecular fragment either being capable of being synthesized from reagent compounds accessible in substituted form whereby to import said substituent groups on synthesis of said molecular fragment or being present in an accessible reagent compound capable of substitution with said substituent groups by reaction with further accessible reagent compounds;

(4) generating a set of lists of accessible reagent compounds, the lists being such that a combination of compounds taken from each list may be reacted to produce a candidate compound comprising said molecular fragment carrying a plurality of substituent groups thereby generating a first virtual library of candidate compounds being the theoretical set of compounds producible by reaction of the members of said lists, each member of each list comprising a component common to the other members of that list and a component unique within that list;

(5) for each said list limiting the number of members thereof using a first set of exclusion rules thereby to generate a restricted second virtual library of candidate compounds, the operation of said first set of rules involving for each member of each list computerised comparison for favourable or unfavourable interactions between said computerised model and a structure comprising said molecular fragment and a substituent deriving from the unique component within said list of that member, the molecular fragment and the computerised model being held in fixed spatial relationship to each other for said comparison;

(6) evaluating and ranking by computer the members of said second virtual library for favourable and unfavourable interactions with said computerised model and thereby generating a restricted third virtual library of candidate compounds ranked as having favourable interactions;

(7) optionally, selecting from said third virtual library at least one further molecular fragment and repeating 
steps (4), (5) and (6) to generate an alternative third virtual library; 

(8) screening said third virtual library using a second set of exclusion rules thereby to generate a restricted fourth virtual library of candidate compounds comprising compounds which are candidates for synthesis and experimental evaluation for drug efficacy;

(9) synthesizing some or all candidate compounds of said fourth virtual library to produce a candidate com- 
pound library; 

(10) experimentally evaluating the compounds of said candidate compound library for drug efficacy;

(11) analysing the experimental efficacy data generated in step (10) for structure-activity relationship information;

(12) using the information derived in step (11) selecting a revised set of lists of accessible reagent compounds, said lists being expanded to include selected reagents not present in the restricted lists generated in step (5) and optionally restricted to exclude selected reagents present in the restricted lists generated in step (5);

(13) repeating steps (6) and (7) to identify further compounds which are candidates for synthesis and experimental evaluation for drug efficacy;

(14) synthesising and experimentally evaluating said further compounds for drug efficacy;

(15) if required repeating steps (11) to (14) one or more times;

(16) identifying as a lead candidate a compound synthesized and experimentally evaluated as above;

(17) manufacturing the compound identified in step (16) above; and, optionally, 

(18) admixing the compound manufactured in step (17) above with at least one pharmaceutically acceptable carrier or excipient.